Assessing Reliability on Annotations (1): Theoretical Considerations
Abstract
This is the first part of a two-report mini-series focussing on issues in the evaluation of annotations. In this theoretically oriented report we lay out the statistical background relevant to reliability studies, evaluate some pertinent approaches, and sketch some arguments that may lend themselves to the development of an original statistic. The second, more empirically oriented report [Lücking and Stegmann, 2005] describes the project background, including the documentation of the annotation scheme at stake and the empirical data collected, and presents and discusses the results of applying the relevant statistics. The following points are dealt with in detail here. We summarize and contribute to an argument by Gwet [2001] indicating that the popular pi and kappa statistics [Carletta, 1996] are generally not appropriate for assessing the degree of agreement between raters on categorical type-ii data. We propose using AC1 [Gwet, 2001] instead, since its desirable mathematical properties make it more appropriate for assessing the results of expert raters in general. For type-i data, we use conventional correlation statistics which, unlike their AC1 and kappa cousins, do not adjust their results for agreement due to chance. Furthermore, we discuss issues in interpreting the results of the different statistics. Finally, we take up some loose ends from the previous chapters and sketch some advanced ideas concerning inter-rater agreement statistics, highlighting differences as well as common ground between Gwet's perspective and our own stance. We conclude with some preliminary suggestions for developing an original statistic, omega, that differs in nature from those discussed before.
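To make the contrast between kappa-type statistics and AC1 concrete, here is a minimal Python sketch of the standard two-rater formulas for Cohen's kappa, Scott's pi and Gwet's AC1 on nominal categories, as they are usually stated in the literature. It is not code from the report; the function name `agreement_stats` and the example table are hypothetical. All three statistics share the form (p_a - p_e) / (1 - p_e) and differ only in how the chance-agreement term p_e is modelled. The skewed example table illustrates the kind of high-prevalence case behind Gwet's criticism.

```python
import math
from typing import Dict, List


def agreement_stats(table: List[List[int]]) -> Dict[str, float]:
    """Chance-corrected agreement for two raters on nominal categories.

    table[i][j] counts the items that rater 1 assigned to category i and
    rater 2 assigned to category j (a q x q contingency table, q >= 2).
    Returns observed agreement, Cohen's kappa, Scott's pi and Gwet's AC1.
    """
    q = len(table)
    if q < 2 or any(len(row) != q for row in table):
        raise ValueError("expected a square table with at least two categories")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no ratings")

    # Observed agreement: proportion of items on the diagonal.
    p_a = sum(table[k][k] for k in range(q)) / n

    # Marginal category proportions for each rater separately.
    rater1 = [sum(table[k]) / n for k in range(q)]
    rater2 = [sum(table[i][k] for i in range(q)) / n for k in range(q)]
    # Pooled marginals (average of both raters), used by pi and AC1.
    pooled = [(rater1[k] + rater2[k]) / 2 for k in range(q)]

    # The statistics differ only in how they model chance agreement p_e.
    pe_kappa = sum(rater1[k] * rater2[k] for k in range(q))  # Cohen
    pe_pi = sum(p * p for p in pooled)                        # Scott
    pe_ac1 = sum(p * (1 - p) for p in pooled) / (q - 1)       # Gwet

    def corrected(p_e: float) -> float:
        # (p_a - p_e) / (1 - p_e); undefined when chance agreement is total.
        return math.nan if p_e >= 1 else (p_a - p_e) / (1 - p_e)

    return {
        "observed": p_a,
        "kappa": corrected(pe_kappa),
        "pi": corrected(pe_pi),
        "ac1": corrected(pe_ac1),
    }


if __name__ == "__main__":
    # Hypothetical skewed table: both raters almost always choose category 0.
    example = [[90, 5],
               [5, 0]]
    for name, value in agreement_stats(example).items():
        print(f"{name}: {value:.2f}")
```

On this table the raters agree on 90% of the items, yet kappa and pi come out at about -0.05, while AC1 is about 0.89. When one category dominates, the kappa-type chance term (here 0.905) approaches or even exceeds the observed agreement, so the corrected value collapses. AC1's chance term (here 0.095) stays small in the same situation.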
Similar resources
Assessing Reliability on Annotations (2): Statistical Results for the deikon Scheme
This is the second part of a two-report mini-series focussing on issues in the evaluation of annotations. In this empirically-oriented report we lay out the documentation of the annotation scheme used in the deikon project, discuss the results obtained in a respective reliability study and conclude with some suggestions regarding forthcoming versions of the scheme. Relevant statistical backgrou...
The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning
In our modern technological world, Computer-Assisted Language learning (CALL) is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...
Assessment of Infant Movement With a Compact Wireless Accelerometer System
There is emerging data that patterns of motor activity early in neonatal life can predict impairments in neuromotor development. However, current techniques to monitor infant movement mainly rely on observer scoring, a technique limited by skill, fatigue, and inter-rater reliability. Consequently, we tested the use of a lightweight, wireless, accelerometer system that measures movement and can ...
Content Analysis Table of Medical Ethics Book Based on Allport’s Theory of Value System
Introduction: Regular assessment of academic textbooks and revision of teaching methods are critical for making such textbooks more efficient in meeting the needs of the new generation and conveying values to them. Therefore, in line with the necessity of textbook evaluation, this research examined the extent to which the Medical Ethics book named “physicians and ethical considerations” observe...
Vox Populi Annotation: Measuring Intensity of Ideological Perspectives by Aggregating Group Judgments
Polarizing discussions about political and social issues are common in mass media. Annotations on the degree to which a sentence expresses an ideological perspective can be valuable for evaluating computer programs that can automatically identify strongly biased sentences, but such annotations remain scarce. We annotated the intensity of ideological perspectives expressed in 250 sentences by ag...
Publication date: 2005